NSF PAR Search | NSF Public Access Repository

Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving

Lin, Yong; Tang, Shange; Lyu, Bohan; Wu, Jiayun; Lin, Hongzhou; Yang, Kaiyu; Li, Jia; Xia, Mengzhou; Chen, Danqi; Arora, Sanjeev; et al. (October 2025, Conference on Language Modeling)

Full Text Available
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly

Yen, Howard; Gao, Tianyu; Hou, Minmin; Ding, Ke; Fleischer, Daniel; Izsak, Peter; Wasserblat, Moshe; Chen, Danqi (April 2025, International Conference on Learning Representations (ICLR))

Full Text Available
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization

Razin, Noam; Malladi, Sadhika; Bhaskar, Adithya; Chen, Danqi; Arora, Sanjeev; Hanin, Boris (January 2025, Proceedings of ICLR 2025)

Direct Preference Optimization (DPO) and its variants are increasingly used for aligning language models with human preferences. Although these methods are designed to teach a model to generate preferred responses more frequently relative to dispreferred responses, prior work has observed that the likelihood of preferred responses often decreases during training. The current work sheds light on the causes and implications of this counterintuitive phenomenon, which we term likelihood displacement. We demonstrate that likelihood displacement can be catastrophic, shifting probability mass from preferred responses to responses with an opposite meaning. As a simple example, training a model to prefer "No" over "Never" can sharply increase the probability of "Yes". Moreover, when aligning the model to refuse unsafe prompts, we show that such displacement can unintentionally lead to unalignment, by shifting probability mass from preferred refusal responses to harmful responses (e.g., reducing the refusal rate of Llama-3-8B-Instruct from 74.4% to 33.4%). We theoretically characterize that likelihood displacement is driven by preferences that induce similar embeddings, as measured by a centered hidden embedding similarity (CHES) score. Empirically, the CHES score enables identifying which training samples contribute most to likelihood displacement in a given dataset. Filtering out these samples effectively mitigated unintentional unalignment in our experiments. More broadly, our results highlight the importance of curating data with sufficiently distinct preferences, for which we believe the CHES score may prove valuable.
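The filtering idea in this abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes the CHES score of a preference pair is the inner product of the summed hidden embeddings of the preferred and dispreferred responses, minus the squared norm of the summed preferred embeddings; the paper should be consulted for the exact definition. The function names (`ches_score`, `filter_by_ches`), the `keep_fraction` parameter, and the toy embeddings are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# (sample id, per-token hidden embeddings of preferred, of dispreferred)
Sample = Tuple[str, List[Vector], List[Vector]]


def sum_vectors(vectors: List[Vector]) -> List[float]:
    """Element-wise sum of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) for i in range(dim)]


def ches_score(preferred: List[Vector], dispreferred: List[Vector]) -> float:
    """Assumed CHES form: <sum(h+), sum(h-)> - ||sum(h+)||^2."""
    s_pos = sum_vectors(preferred)
    s_neg = sum_vectors(dispreferred)
    inner = sum(p * n for p, n in zip(s_pos, s_neg))
    sq_norm = sum(p * p for p in s_pos)
    return inner - sq_norm


def filter_by_ches(samples: List[Sample], keep_fraction: float) -> List[Sample]:
    """Keep the lowest-scoring fraction, dropping the highest CHES scores."""
    ranked = sorted(samples, key=lambda s: ches_score(s[1], s[2]))
    n_keep = int(len(ranked) * keep_fraction)
    return ranked[:n_keep]


# Toy data: "similar" has identical embeddings, "distinct" has orthogonal ones.
toy_samples: List[Sample] = [
    ("similar", [[1.0, 0.0]], [[1.0, 0.0]]),
    ("distinct", [[1.0, 0.0]], [[0.0, 1.0]]),
]
kept = filter_by_ches(toy_samples, keep_fraction=0.5)
```

In this toy case the pair with identical embeddings gets the higher score and is dropped, which matches the abstract's point that preferences inducing similar embeddings are the ones to filter out.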
Full Text Available
Long-Context Language Modeling with Parallel Context Encoding

https://doi.org/10.18653/v1/2024.acl-long.142

Yen, Howard; Gao, Tianyu; Chen, Danqi (August 2024, Association for Computational Linguistics)

Full Text Available
LitSearch: A Retrieval Benchmark for Scientific Literature Search

https://doi.org/10.18653/v1/2024.emnlp-main.840

Ajith, Anirudh; Xia, Mengzhou; Chevalier, Alexis; Goyal, Tanya; Chen, Danqi; Gao, Tianyu (November 2024, Association for Computational Linguistics)

Full Text Available
SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal

Xie, Tinghao; Qi, Xiangyu; Zeng, Yi; Huang, Yangsibo; Sehwag, Udari; Huang, Kaixuan; He, Luxi; Wei, Boyi; Li, Dacheng; Sheng, Ying; et al. (January 2025, International Conference on Learning Representations (ICLR))

Full Text Available
Adapting Language Models to Compress Contexts

https://doi.org/10.18653/v1/2023.emnlp-main.232

Chevalier, Alexis; Wettig, Alexander; Ajith, Anirudh; Chen, Danqi (December 2023, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing)
Enabling Large Language Models to Generate Text with Citations

https://doi.org/10.18653/v1/2023.emnlp-main.398

Gao, Tianyu; Yen, Howard; Yu, Jiatong; Chen, Danqi (December 2023, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing)
MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions

https://doi.org/10.18653/v1/2023.emnlp-main.971

Zhong, Zexuan; Wu, Zhengxuan; Manning, Christopher; Potts, Christopher; Chen, Danqi (December 2023, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing)
Privacy Implications of Retrieval-Based Language Models

https://doi.org/10.18653/v1/2023.emnlp-main.921

Huang, Yangsibo; Gupta, Samyak; Zhong, Zexuan; Li, Kai; Chen, Danqi (December 2023, Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing)
